Practical Hardening of Crash-Tolerant Systems
Authors
Abstract
Recent failures of production systems have highlighted the importance of tolerating faults beyond crashes. The industry has so far addressed this problem by hardening crash-tolerant systems with ad hoc error detection checks, potentially overlooking critical fault scenarios. We propose a generic and principled hardening technique for Arbitrary State Corruption (ASC) faults, which specifically model the effects of realistic data corruptions on distributed processes. Hardening does not require the use of trusted components or the replication of the process over multiple physical servers. We implemented a wrapper library to transparently harden distributed processes. To exercise our library and evaluate our technique, we obtained ASC-tolerant versions of Paxos, of a subset of the ZooKeeper API, and of an eventually consistent storage by implementing crash-tolerant protocols and automatically hardening them using our library. Our evaluation shows that the throughput of our ASC-hardened state machine replication outperforms its Byzantine-tolerant counterpart by up to 70%.
Similar Resources
Recent Results on Fault-Tolerance Consensus in Message-Passing Networks
This paper surveys recent results on fault-tolerant consensus in message-passing networks. We focus on two categories of work: (i) new problem formulations (including input domain, fault model, network model, etc.), and (ii) practical applications. For the second part, we focus on Crash Fault-Tolerant (CFT) systems that use Paxos or Raft, and Byzantine Fault-Tolerant (BFT) systems. We also br...
Implementing Adaptive Fault-Tolerant Services for Hybrid Faults
The two major approaches to building fault-tolerant services are commonly known as the Primary-Backup approach (PB) and the State-Machine approach (SM). PB can tolerate crash and omission faults and runs more economically than SM, but SM can tolerate more serious faults, including arbitrary or Byzantine faults. Instead of selecting one or the other approach, thus either incurring a high running...
Making Distributed Applications Robust
We present a novel translation of systems that are tolerant of crash failures to systems that are tolerant of Byzantine failures in an asynchronous environment, making weaker assumptions than previous approaches. In particular, we assume little about how the application is coded. The translation exploits an extension of the Srikanth-Toueg protocol, supporting ordering in addition to authenticat...
Reliable Broadcast in a Computational Hybrid Model with Byzantine Faults, Crashes, and Recoveries
This paper presents a formal model for asynchronous distributed systems with parties that exhibit Byzantine faults or that crash and subsequently recover. Motivated by practical considerations, it represents an intermediate step between crash-recovery models for distributed computing and proactive security methods for tolerating arbitrary faults. The model is computational and based on complexi...
Solution of Nonlinear Hardening and Softening type Oscillators by Adomian’s Decomposition Method
A type of nonlinearity in vibrational engineering systems emerges when the restoring force is a nonlinear function of displacement. The derivative of this function is known as stiffness. If the stiffness increases by increasing the value of displacement from the equilibrium position, then the system is known as hardening type oscillator and if the stiffness decreases by increasing the value of ...